ABSTRACT


Provable data possession (PDP) provides mechanisms to efficiently audit the integrity of 
data held by third parties, like cloud service providers. While multiple PDP schemes have 
been proposed, there is no research to date that provides in-depth cost analysis for PDP. This 
research fills that gap by (1) collecting and analyzing cost data for four PDP schemes, (2) 
providing generic cost models (mathematical formulae expressing abstract models which 
can be used to infer future cost), and (3) comparing overall cost efficiency of each PDP 
scheme. For the schemes considered in this study, we find all have nearly identical costs 
in practice; however, sophisticated schemes designed with low communication complexity 
have higher preprocessing or storage costs which, depending on audit parameters, impact 
total scheme cost. We conclude that MAC-PDP and CPOR schemes are similar, whereas 
the cost of A-PDP becomes relatively expensive at large file sizes. Our basis cost projections
show tagging, storing and auditing a file for one year at one audit per hour is at least
$160 for a 1 GB file, $170 for a 1 TB file, and $2,000 for a 1 PB file using a cost model 
based on the Amazon S3 service. 
CHAPTER 1:

Introduction 


The Department of Defense (DOD) has identified collaboration and improved access to 
information as key elements of future operational success. This need, coupled with the 
massive growth in data, has led the U.S. Navy and other DOD entities to invest in cloud 
storage capabilities in an effort to cope with “Big Data.” In 2014, Terry Halvorsen, then- 
Navy Chief Information Officer (CIO), stated the Navy needs to move about half of its 
unclassified data into commercial cloud storage [1]. In late 2014, as acting DOD CIO, 
Halvorsen released a memo freeing DOD agencies to procure their own commercial cloud 
services, without using Defense Information Systems Agency (DISA), in an effort to speed 
up the migration process [2]. 

With the growing use of cloud storage solutions, there is a corresponding need for secure
and efficient means of guaranteeing data integrity and availability. The Federal Cloud Computing
Strategy of 2011 states that agencies should explicitly state security, availability, and
quality requirements through service level agreements, and routinely monitor vendor compliance
[3]. The DOD Cloud Computing Strategy of 2012 also establishes the requirement
for cloud services to provide sufficient security to ensure the integrity and availability of
DOD information [4]. In 2015, the DOD released its Cloud Computing Security Requirements
Guide (SRG), which outlines the security requirements for DOD agencies procuring
commercial cloud services. Among its recommendations are policies that would provide
audit and accountability for data additions, deletions, and modifications [5]. Recent outages
at well-known cloud storage providers, including Amazon S3 and Microsoft Azure,
also underscore the need for a reliable and efficient auditing mechanism to ensure data
availability and integrity as agencies migrate to the commercial cloud [6]-[8].

Proof of data possession schemes may provide the best mechanism to fulfill these demands
to actively track vendor compliance and assure the integrity of data in storage. Through
the use of cryptographic protocols, proof of data possession (PDP) schemes provide probabilistic
guarantees that data on storage servers has not been maliciously or inadvertently
deleted or altered. They claim to provide these guarantees at low cost to both the proving
and verifying entities, with asymptotic costs that are strictly sublinear in file size. This
technology has not yet been implemented by or for a commercial
service; however, there has been substantial research in PDP and other data integrity
schemes over the past decade [9]-[32].

While multiple PDP schemes have been proposed, each with varying degrees of efficiency 
and security, there is no research to date that provides in-depth cost analysis comparing the 
real-world efficiencies of PDP schemes. All prior research has focused on two aspects of 
PDP schemes: providing high probability guarantees of data possession (security) while 
minimizing the size of the challenge and response (communication complexity). These are 
important criteria, especially in bandwidth-constrained environments; however, to date, no 
research has provided comparisons of PDP schemes in terms of real-world costs (time to 
generate proof, time to verify proof, time to tag, cost to store tag overhead, cost to run an 
audit service, cost to service requests from an audit service, etc.).

Our research fills that gap by (1) collecting and analyzing cost data for four PDP schemes, 
(2) providing generic cost models (mathematical formulae expressing abstract models 
which can be used to infer future cost), and (3) comparing overall cost efficiency of each 
PDP scheme. Additionally, instead of measuring costs primarily in terms of the size of the 
query and response - a bandwidth concern - this research recognizes (a) the importance 
of processing time when evaluating the cost of a particular scheme, and (b) the asymmetric
costs associated with some cloud cost models (e.g., PUTs are typically more expensive
than GETs). 

Based on our generic cost models, we show that the basis costs to audit are nearly identical 
for MAC-PDP, A-PDP, and CPOR, but tag and storage costs are different enough to have 
a significant impact on total cost among the schemes. We also show that the total costs
of MAC-PDP and CPOR are similar, but A-PDP becomes expensive relative to the other
schemes at large file sizes, due to its higher tag and storage costs. We show that the total 
basis cost (up-front cost to tag and cumulative cost storing and auditing) for one year at one 
audit per hour of a 1 GB file is under $1 for MAC-PDP, A-PDP, and CPOR, but that cost 
ranges from $4,400 to $38,700 across schemes for a 1 PB file. 
CHAPTER 2:
Background 


This research focuses on four specific PDP schemes: a simple MAC-based PDP scheme
(MAC-PDP), the scheme described by Ateniese, Burns, Curtmola, Herring, Kissner, Peterson
and Song (A-PDP) [33], the scheme described by Ateniese, Pietro, Mancini and
Tsudik (SEPDP) [34] and the scheme described by Shacham and Waters (CPOR) [35].
Our research is primarily concerned with building accurate cost models for each scheme 
based on experimental audit data. Below, we provide a generic description of PDP and a 
description of each PDP scheme considered in our experiments. 


2.1 Proof of Data Possession 

A PDP system can be divided into two generic phases: the set-up phase and the challenge 
phase. In the set-up phase, a client generates a public and private key pair, tags the file, and 
uploads the file and tag data to storage, deleting it from local storage. During the challenge 
phase, the client generates a challenge for a specified number of file blocks and sends the 
challenge to the prover. The prover uses the challenge to generate a proof of possession, 
which is returned to the client. The client then validates the proof, providing a probabilistic 
guarantee that the prover does or does not possess the client’s file. 

Following the notation of Juels and Kaliski [36] and Bowers, Juels, and Oprea [37], a file
$M$ can be divided into $n$ blocks, $M = (m_1, m_2, \ldots, m_n)$. We let $P$ denote the prover (server),
$V$ denote the verifier (client), $\eta$ denote the file's identifier, and $\omega$ denote local client state.
We represent unspecified values with a $\bot$ symbol. A generic PDP scheme can be considered
a five-tuple of algorithms, (KeyGen, Tag, Challenge, Proof, Verify), each described as
follows.

KeyGen$(1^\kappa) \to (pk, sk)$. This algorithm is used by the client to generate random public
and private keys by employing security parameter $\kappa$.

Tag$(M; pk, sk, \omega) \to M^*$. This algorithm is used by the client to process a file and produce
verification tag data. It takes as input a public and private key pair $(pk, sk)$ and
file $M$. It generates a file ID $\eta$ and returns $M^*$, the encoded file with verification tag
data. It also updates the client state $\omega$ to include any locally held data such as the file
ID, file size, number of blocks, etc. The data $M^*$ can be stored remotely.

Challenge$(\eta; pk, sk, \omega) \to c$. This algorithm is used by the client to produce a challenge
$c$. This challenge will be sent to the prover during an audit.

Proof$(\eta, M^*, c; pk) \to p$. This algorithm is used by the prover to demonstrate proof of
possession of specified file blocks as a response to challenge $c$. It takes as input the
remote, encoded data $M^*$ and challenge $c$, to generate proof $p$.

Verify$(c, p, \eta; pk, sk, \omega) \to b \in \{0, 1\}$. This algorithm is used by the client to validate the
proof $p$. It takes as input the public and private key pair $(pk, sk)$, challenge $c$ and
proof $p$. Upon successful validation it returns 1, else it returns 0.


2.2 Constructions 

In this section, we provide detailed descriptions of each PDP scheme employed in our 
study: MAC-PDP, A-PDP, SEPDP and CPOR. 


2.2.1 MAC-PDP 

The MAC-PDP scheme is defined below, following the description and notation from
Shacham and Waters [35] and Riebel [38], adapted slightly for uniformity with the other
schemes in Section 2.2.

Let $f$ be a keyed pseudo-random function, as follows:

$f : \{0,1\}^* \times \mathcal{K}_{prf} \to \mathbb{Z}_p$

KeyGen$(1^\kappa) \to (pk, sk)$. Choose a random secret key for a hash-based MAC function,
$k_{mac} \leftarrow \mathcal{K}_{prf}$. The secret key is $sk = (k_{mac})$ and the public key is $pk = \bot$.

Tag$(M; pk, sk, \omega) \to M^*$. The file is split into $n$ blocks, $M = (m_1, m_2, \ldots, m_n)$. Choose
a random file ID $\eta$, where $\eta \in \mathbb{Z}_p$. For each block $m_i$ ($1 \le i \le n$), generate tag
$\sigma_i = \mathrm{MAC}_{k_{mac}}(\eta \| m_i)$. The data stored remotely is $M^* = (M, \{\sigma_i\}_{1 \le i \le n})$.

Challenge$(\eta; pk, sk, \omega) \to c$. Choose a random $\ell$-element subset $I \subseteq [1, n]$ of indices.
Let $c$ be the set $\{i\}_{i \in I}$.

Proof$(\eta, M^*, c; pk) \to p$. For each $i \in c$, return to the verifier $p = \{(m_i, \sigma_i)\}_{i \in c}$.

Verify$(c, p, \eta; pk, sk, \omega) \to b \in \{0, 1\}$. For each $i \in c$, check whether $\sigma_i = \mathrm{MAC}_{k_{mac}}(\eta \| m_i)$. If
all $\ell$ checks are correct then return $b = 1$, else return $b = 0$.
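The five MAC-PDP algorithms can be illustrated with a short Python sketch. It is a minimal illustration built on the standard library, not the libpdp implementation used in our experiments: the function names, the choice of HMAC-SHA1, the 16-byte random file ID, and zero-based block indices are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 4096  # bytes; illustrative default


def keygen(key_len=20):
    """Return (pk, sk); MAC-PDP has no public key, so pk is None."""
    return None, secrets.token_bytes(key_len)


def split_blocks(data, block_size=BLOCK_SIZE):
    """Split the file into consecutive blocks (the last may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def mac(sk, eta, block):
    """MAC over eta || m_i, as in the description (HMAC-SHA1 is assumed)."""
    return hmac.new(sk, eta + block, hashlib.sha1).digest()


def tag(data, sk, block_size=BLOCK_SIZE):
    """Return (eta, blocks, tags) with one tag per block."""
    eta = secrets.token_bytes(16)  # random file ID (length is an assumption)
    blocks = split_blocks(data, block_size)
    tags = [mac(sk, eta, m) for m in blocks]
    return eta, blocks, tags


def challenge(n, ell):
    """Pick min(ell, n) distinct block indices uniformly at random."""
    return secrets.SystemRandom().sample(range(n), min(ell, n))


def proof(blocks, tags, c):
    """Prover side: return the challenged blocks with their tags."""
    return [(blocks[i], tags[i]) for i in c]


def verify(c, p, eta, sk):
    """Client side: recompute every MAC; return 1 only if all match."""
    if len(p) != len(c):
        return 0
    ok = all(hmac.compare_digest(mac(sk, eta, m), s) for m, s in p)
    return 1 if ok else 0
```

Because the proof returns every challenged block together with its tag, the response grows with both the number of challenged blocks and the block size, which is the communication cost the other schemes try to avoid.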


2.2.2 A-PDP 

The A-PDP scheme is defined below, following the description and notation from Ateniese
et al. [33], adapted slightly for uniformity with the other schemes in Section 2.2.

Let $H$ be a cryptographic hash function, $h$ be a full-domain hash function, $f$ be a pseudo-random
function and $\pi$ be a pseudo-random permutation (PRP) as follows (where $\kappa$, $\ell$, $\lambda$
are security parameters):

$h : \{0,1\}^* \to QR_N$ ($QR_N$ is the set of quadratic residues modulo $N$)
$f : \{0,1\}^\kappa \times \{0,1\}^{\log_2(n)} \to \{0,1\}^\ell$
$\pi : \{0,1\}^\kappa \times \{0,1\}^{\log_2(n)} \to \{0,1\}^{\log_2(n)}$

KeyGen$(1^\kappa) \to (pk, sk)$. Choose safe primes $p, q$, where $p = 2p' + 1$ and $q = 2q' + 1$. Let
$N = pq$. Let $g$ be a generator of $QR_N$, the set of quadratic residues modulo $N$. Let
$v \leftarrow \{0,1\}^\kappa$. The public key is $pk = (N, g)$ and the secret key is $sk = (e, d, v)$, such that
$e$ is a large secret prime with $ed \equiv 1 \pmod{p'q'}$, $e > \lambda$, $d > \lambda$.

Tag$(M; pk, sk, \omega) \to M^*$. The file is split into $n$ blocks, $M = (m_1, m_2, \ldots, m_n)$. For each
block $m_i$, compute $T_{i,m_i} = (h(W_i) \cdot g^{m_i})^d \bmod N$, where $W_i = v \| i$. The data stored
remotely is $M^* = (M, \{(T_{i,m_i}, W_i)\}_{1 \le i \le n})$.

Challenge$(\eta; pk, sk, \omega) \to c$. To audit $\ell$ blocks of $M$, generate challenge
$c = (\ell, k_1, k_2, g_s)$, where $k_1$ and $k_2$ are random $\kappa$-bit keys, and $g_s = g^s \bmod N$ for random $s$.

Proof$(\eta, M^*, c; pk) \to p$. For $1 \le j \le \ell$, generate indices $i_j = \pi_{k_1}(j)$ and coefficients
$a_j = f_{k_2}(j)$. Compute
$T = T_{i_1,m_{i_1}}^{a_1} \cdots T_{i_\ell,m_{i_\ell}}^{a_\ell} = (h(W_{i_1})^{a_1} \cdots h(W_{i_\ell})^{a_\ell} \cdot g^{a_1 m_{i_1} + \cdots + a_\ell m_{i_\ell}})^d \bmod N$.
Compute $\rho = H(g_s^{a_1 m_{i_1} + \cdots + a_\ell m_{i_\ell}} \bmod N)$. The proof is
$p = (T, \rho)$.

Verify$(c, p, \eta; pk, sk, \omega) \to b \in \{0, 1\}$. Let $\tau = T^e$. For $1 \le j \le \ell$, compute $i_j = \pi_{k_1}(j)$,
$W_{i_j} = v \| i_j$, $a_j = f_{k_2}(j)$, and $\tau = \tau / h(W_{i_j})^{a_j} \bmod N$. If $H(\tau^s \bmod N) = \rho$ then
return $b = 1$, else return $b = 0$.





2.2.3 CPOR 

The CPOR scheme is defined below, following the description and notation from Shacham
and Waters [35], adapted slightly for uniformity with the other schemes in Section 2.2.

Let $f$ be a keyed pseudo-random function, as follows:

$f : \{0,1\}^* \times \mathcal{K}_{prf} \to \mathbb{Z}_p$

KeyGen$(1^\kappa) \to (pk, sk)$. Choose a random key $k_{enc} \leftarrow \mathcal{K}_{enc}$ for symmetric encryption
scheme Enc, and a random HMAC key $k_{mac} \leftarrow \mathcal{K}_{mac}$. The secret key is
$sk = (k_{enc}, k_{mac})$ and the public key is $pk = \bot$.

Tag$(M; pk, sk, \omega) \to M^*$. Given the file $M$, split $M$ into $n$ blocks, each $s$ sectors
long: $M = (m_{ij})_{1 \le i \le n,\ 1 \le j \le s}$. Choose a PRF key $k_{prf} \leftarrow \mathcal{K}_{prf}$ and $s$ random numbers
$\alpha_1, \ldots, \alpha_s \leftarrow \mathbb{Z}_p$. Let $\tau_0 = (n \| \mathrm{Enc}_{k_{enc}}(k_{prf} \| \alpha_1 \| \cdots \| \alpha_s))$. The file tag is
$\tau = (\tau_0 \| \mathrm{MAC}_{k_{mac}}(\tau_0))$. For each $i$, $1 \le i \le n$, compute

$\sigma_i \leftarrow f_{k_{prf}}(i) + \sum_{j=1}^{s} \alpha_j m_{ij}$

The data stored remotely is $M^* = (\{m_{ij}\}, \{\sigma_i\})$.

Challenge$(\eta; pk, sk, \omega) \to c$. Choose a random $\ell$-element subset $I \subseteq [1, n]$. For each
$i \in I$ choose random $\nu_i \leftarrow \mathbb{Z}_p$. Let $c$ be the set $\{(i, \nu_i)\}_{i \in I}$.

Proof$(\eta, M^*, c; pk) \to p$. The prover parses $c$ as $\{(i, \nu_i)\}$ and computes

$\mu_j \leftarrow \sum_{(i,\nu_i) \in c} \nu_i m_{ij}$ for $1 \le j \le s$, and $\sigma \leftarrow \sum_{(i,\nu_i) \in c} \nu_i \sigma_i$

The proof is $p = (\{\mu_j\}_{1 \le j \le s}, \sigma)$.

Verify$(c, p, \eta; pk, sk, \omega) \to b \in \{0, 1\}$. Check whether

$\sigma = \sum_{(i,\nu_i) \in c} \nu_i f_{k_{prf}}(i) + \sum_{j=1}^{s} \alpha_j \mu_j$

If equal then return $b = 1$, else return $b = 0$.
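The linear structure that lets CPOR return a proof of constant size in the number of challenged blocks can be seen in a small Python sketch. It is a toy illustration under stated assumptions, not the libpdp code: the 61-bit prime, 7-byte sectors, four sectors per block, the HMAC-SHA256 stand-in for the PRF, and zero-based indices are all chosen for brevity, and the encrypted file tag is omitted.

```python
import hashlib
import hmac
import secrets

P = (1 << 61) - 1  # toy prime modulus (assumption; far smaller than a real p)
SECTOR = 7         # bytes per sector, so every sector value is below P
S = 4              # sectors per block (toy value)


def prf(k_prf, i):
    """Stand-in for f_{k_prf}(i): HMAC-SHA256 output reduced into Z_p."""
    d = hmac.new(k_prf, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(d, "big") % P


def to_sectors(data):
    """Split data into blocks of S sectors, each sector an integer in Z_p."""
    step = SECTOR * S
    data = data + b"\0" * (-len(data) % step)  # zero-pad the last block
    return [[int.from_bytes(data[b + j * SECTOR:b + (j + 1) * SECTOR], "big")
             for j in range(S)]
            for b in range(0, len(data), step)]


def tag(m, k_prf, alpha):
    """sigma_i = f(i) + sum_j alpha_j * m_ij (mod p), one tag per block."""
    return [(prf(k_prf, i) + sum(a * x for a, x in zip(alpha, row))) % P
            for i, row in enumerate(m)]


def challenge(n, ell):
    """Random min(ell, n)-element index set with a coefficient nu_i each."""
    idx = secrets.SystemRandom().sample(range(n), min(ell, n))
    return [(i, secrets.randbelow(P)) for i in idx]


def proof(m, sigma, c):
    """Aggregate challenged sectors into mu_1..mu_s and tags into one sigma."""
    mu = [sum(v * m[i][j] for i, v in c) % P for j in range(S)]
    agg = sum(v * sigma[i] for i, v in c) % P
    return mu, agg


def verify(c, p, k_prf, alpha):
    """Check sigma == sum nu_i * f(i) + sum alpha_j * mu_j (mod p)."""
    mu, agg = p
    expected = (sum(v * prf(k_prf, i) for i, v in c)
                + sum(a * u for a, u in zip(alpha, mu))) % P
    return 1 if agg == expected else 0
```

The proof holds S + 1 field elements regardless of how many blocks are challenged, at the price of doing modular arithmetic over every sector during tagging.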


2.2.4 SEPDP 

The SEPDP scheme is defined below, following the description and notation from Ateniese
et al. [34], adapted slightly for uniformity with the other schemes in Section 2.2.

Let $t$ be the number of possible challenges, $H$ be a cryptographic hash function, $AE$ be an
authenticated encryption scheme, $f$ be a keyed pseudo-random function and $\pi$ be a keyed
pseudo-random permutation, defined as follows:

$H : \{0,1\}^* \to \{0,1\}^d$
$f : \{0,1\}^\kappa \times \{0,1\}^{\log(t)} \to \{0,1\}^L$
$\pi : \{0,1\}^L \times \{0,1\}^{\log(n)} \to \{0,1\}^{\log(n)}$

KeyGen$(1^\kappa) \to (pk, sk)$. Choose secret permutation key $W \leftarrow \{0,1\}^\kappa$, master challenge
nonce key $Z \leftarrow \{0,1\}^\kappa$ and master encryption key $K \leftarrow \{0,1\}^\kappa$. The secret key is
$sk = (W, Z, K)$. The public key is $pk = \bot$.

Tag$(M; pk, sk, \omega) \to M^*$. Divide message $M$ into $n$ blocks. Choose the number $t$ of
possible random challenges and the number $\ell$ of block indices per verification. For
each $1 \le i \le t$, generate the $i$-th tag as follows:

Generate a permutation key $k_i = f_W(i)$ and nonce $c_i = f_Z(i)$.
Compute the set of indices $\{i_j \in [1, n] \mid 1 \le j \le \ell\}$ where $i_j = \pi_{k_i}(j)$.
Compute token $v_i = H(c_i, m_{i_1}, \ldots, m_{i_\ell})$.
Encrypt the token $\sigma_i \leftarrow AE_K(i, v_i)$.

The data stored remotely is $M^* = (M, \{\sigma_i\}_{1 \le i \le t})$.

Challenge$(\eta; pk, sk, \omega) \to c$. Generate the $i$-th challenge $c = (k_i, c_i)$ by recomputing
$k_i = f_W(i)$ and $c_i = f_Z(i)$.

Proof$(\eta, M^*, c; pk) \to p$. Compute $z = H(c_i, m_{i_1}, \ldots, m_{i_\ell})$ where $i_j = \pi_{k_i}(j)$. The proof
is $p = (z, \sigma_i)$.

Verify$(c, p, \eta; pk, sk, \omega) \to b \in \{0, 1\}$. Compute $v = AE_K^{-1}(\sigma_i)$. If $v = (i, z)$ then return
$b = 1$, else return $b = 0$.


2.3 Cost Complexity 

The asymptotic communication complexity for each target PDP scheme is summarized 
in Table 2.1. While MAC-PDP affords a simple implementation, it is criticized for its 
relatively large communication complexity. Schemes like A-PDP, CPOR and SEPDP are 
designed with the goal of minimizing communication complexity [33]-[35].
Table 2.1: Asymptotic communication complexity of MAC-PDP, A-PDP,
CPOR and SEPDP.

Scheme     Challenge                                    Proof
MAC-PDP    $O(\ell \log(n))$                            $O(\ell (bs + \kappa))$
A-PDP      $O(\log(\ell) + 2\kappa + \log(\lambda))$    $O(\log(N))$
CPOR       $O(\ell + (\log(n) + d))$                    $O(\log(p))$
SEPDP      $O(L)$                                       $O(d + L)$


The block size bs is a function of file size and n, the number of file blocks. 


2.4 Detection Probability 

It is not the objective of this study to compare proofs associated with PDP schemes. Comparing
the cost of each scheme does, however, require selection of comparable parameters.
There are at least three senses in which PDP schemes might be considered to be
comparable.

Strength of Security. For a scheme, this is expressed as Pr [forge], the probability that a
prover can get the verifier to accept a forged proof as valid (i.e., when it was computed
without using some blocks involved in the challenge).

Strength of Audit. For a scheme, this is expressed as Pr [audit], the probability that a
single audit will appear to succeed even when k of n blocks have been deleted. For
many schemes, this is a combinatorial argument based on the probability that none of the $\ell$
random challenge indices falls among the k blocks deleted.

Efficiency of Recovery. Some PDP schemes, often called proof of retrievability (POR)
schemes, have the additional characteristic that the original file can be recovered
even after some number of failed audits. For such a scheme, this is expressed as
Pr [recover], the probability of retrieval after an $\epsilon$ fraction of audits have failed.

Comparison across schemes in these senses is problematic for a number of reasons: (i)
schemes rely on different primitives (full-domain hash functions, authenticated encryption
schemes, pseudorandom permutations), making parameter selection to achieve comparable
Pr [forge] difficult; (ii) schemes have expressed these properties in slightly different
adversarial models and employing slightly different arguments; (iii) arguments have been
expressed in asymptotic terms rather than concrete terms, making parameter derivation
difficult, especially when arguments employ bounds that are known not to be tight. Thus we
do not select parameters in this study with the objective of providing absolute apples-to-apples
comparison across schemes. Since the simple combinatorial arguments employed
for Pr [audit] tend to be most reusable, we prioritize parameter selection for comparability
in this sense. In some sense, this is a rather insignificant parameter since its probability can
be driven arbitrarily low through repeated audits, due to exponential hardness amplification
of passing a series of audits. At the same time, selection of this parameter may be most
directly related to deriving policy on how often one performs audits. As we are interested
in the recurring cost of audit, it is a natural parameter of our study to consider carefully.
We leave open for future work parameter selection to facilitate fair comparison in terms of
Pr [forge] and Pr [recover].
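The combinatorial argument behind Pr [audit], and the effect of repeating audits, can be sketched in a few lines of Python. This is an illustration under simple assumptions (challenge indices drawn uniformly without replacement, independent audits, deleted blocks fixed in advance); the function names are hypothetical.

```python
from math import comb


def pr_audit_miss(n, k, ell):
    """Probability that ell distinct random indices avoid all k deleted blocks.

    n is the number of blocks in the file, k the number deleted, and ell the
    number of blocks challenged in one audit (capped at n).
    """
    ell = min(ell, n)
    return comb(n - k, ell) / comb(n, ell)


def pr_miss_after(n, k, ell, audits):
    """Probability that a series of independent audits all miss the deletion."""
    return pr_audit_miss(n, k, ell) ** audits
```

For a file of 8192 blocks with 82 blocks (about 1%) deleted and 460 challenged blocks, `pr_audit_miss(8192, 82, 460)` is a little under 0.01, and each additional independent audit multiplies the miss probability by the same factor.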
CHAPTER 3:
Methodology 


This chapter discusses our experimental environment, methodology for how timing data is 
gathered, and implementation decisions for evaluating the performance of the PDP schemes 
under evaluation. 


3.1 Experiment Environment 

Our PDP experiments can be divided into two phases: a set-up phase and an audit phase. In 
the set-up phase, the client generates keys ( pk,sk ), generates a tagged file M*, and sends 
M* to remote storage (see Figure 3.1a). For the audit phase, the client generates a challenge 
c and sends it to the prover; the prover responds with a proof p, which is sent to the client; 
the client verifies the proof and indicates success or failure (see Figure 3.1b). 


[Figure 3.1 panels: (a) Set-up phase of PDP protocol, showing the client generating keys and the transfer from client storage to remote storage; (b) Audit phase of PDP protocol, showing the auditor generating challenge c and the prover responding with proof p.]


Figure 3.1: Set-up and audit phases of PDP experiment. 


Adapted from [33]: G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson,
and D. Song, “Provable data possession at untrusted stores,” in Proceedings of the
14th ACM Conference on Computer and Communications Security, 2007, pp. 598-609.
3.2 Measurements and Costs

It is important to define what system costs are measured in each of our experiments. We 
depict what operations are included in each of our measurements in Figure 3.2. Generally, 
we ignore costs associated with transfer time and service latency, focusing on significant, 
recurring computational costs. 


[Figure 3.2 diagram: timeline of KeyGen and Tag (set-up phase) and Challenge, Proof and Verify (audit phase), marking the tag time, challenge time, proof time and verify time intervals and the intervals that are ignored.]



Figure 3.2: Timing measurement definitions, highlighting what operations 
and costs are included in each measurement. 


In the set-up phase we do not measure the cost of generating keys (pk, sk). During tagging,
we ignore the cost of sending the file and tag data M* to the storage server S. In
the audit phase, we ignore the transfer time involved in sending the challenge to prover P
and in returning the proof to client C. For proof generation, however, we include the time
associated with retrieving challenge blocks from local or remote storage as
part of the proof time. We believe the costs of parsing the challenge, retrieving
the data required for the proof, and generating the proof itself are intimately
related, and we combine these in our measurement.

3.3 Implementation 

Our benchmark test is a single-threaded application written in C using the libpdp library
[39], an open-source C library providing implementations for MAC-PDP, A-PDP,
CPOR, and SEPDP. In all experiments, our benchmark application is run on Amazon Elastic
Compute Cloud (EC2). The client, auditor and prover are each run on the same EC2 instance: a
c3.xlarge instance, running 64-bit Ubuntu Server 14.04 LTS using HVM virtualization. In
other environments, these three parties might be separate hosts or owned by separate organizations
(e.g., tagging and ingest performed by the data owner, and auditing performed by
a third party). Given how we have defined the tag, challenge and verify timing measurements,
the properties of the network connecting these parties are irrelevant to our measurements
and so we elect to run these parties on the same host. For each of our schemes, we conduct
two types of benchmarks: using local data storage and using remote data storage. For local
storage experiments, M* is stored on the EC2 instance’s local storage. For the remote
storage experiments, M* is stored in an Amazon S3 bucket.

Table 3.1: Default benchmark parameters used in our experiments.

MAC-PDP   $\ell = 460$, $k_{mac}$ = 20 bytes
A-PDP     $\ell = 460$, $N$ = 1024 bits, PRP $k_1$ = 16 bytes, PRF $k_2$ = 20 bytes
CPOR      $\ell = 460$, $k_{enc}$ = 32 bytes, $k_{prf}$ = 20, $k_{mac}$ = 20 bytes, $\lambda = 80$, $p$ = 80 bits, sector size = 9 bytes
SEPDP     $\ell = 460$, AE $K$ = 16 bytes, PRP $W, Z$ = 16 bytes, PRF $k_t$ = 20 bytes, $t = 1$

Unless otherwise noted, $bs = 4096$ bytes and $fs = 2^{25}$ bytes.

Experiments are run sequentially, each time doubling block size or file size for a particular
scheme. Pre-experiment trials in which the order of experiments is randomized demonstrated
no discernible impact on our results; thus, we strongly believe our trials are independent
and the order of test execution had no impact on our results. Each experiment is
performed using pre-generated, random input file data. Every experiment is repeated three
times (graphs in Chapter 4 show raw data from all three iterations). The default parameters
used for each scheme are provided in Table 3.1.
CHAPTER 4:
Analysis 


In this chapter, we analyze the timing data collected for each of the five major PDP algorithms:
KeyGen, Tag, Challenge, Proof and Verify. Each algorithm is analyzed separately
across all four schemes, including our expectations based on each algorithm, what the data 
actually show, and the cost model we have developed for each scheme and algorithm. 

For each cost model, we employ the following notation: 

bs, block size in bytes 

fs, file size in bytes 

ss, sector size in bytes 

$c_0, c_1, \ldots$, model-specific constants.

For all the schemes, fs/bs yields the number of blocks in the file M. In each experiment, 
there is a point where the file size and block size are such that the total number of blocks 
falls below the default number of challenges selected for an audit. At this point, fewer 
computations are performed, resulting in faster algorithm times. Otherwise, all schemes 
approach some threshold where proof cost becomes constant. All model-specific constants 
are derived experimentally using least-squares approximation. Unless otherwise noted, all 
figure times are in seconds. 
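As an illustration of how such constants can be obtained, the following Python sketch fits a one-variable model of the form $c_0 + c_1 \cdot fs$ by ordinary least squares. It is a minimal sketch rather than the fitting procedure used for our models: the function name is hypothetical, and models with more terms would need a general linear least-squares solver.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1).

    xs holds the independent variable (for example file sizes in bytes) and
    ys the measured times in seconds. At least two distinct xs are required.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx               # slope: seconds per byte
    c0 = mean_y - c1 * mean_x    # intercept: fixed overhead in seconds
    return c0, c1
```

Given repeated timing measurements over a range of file sizes, the returned pair plays the role of the model-specific constants for a linear model.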


4.1 Tag File 

In our experiments, there is no theoretical difference between running the Tag algorithm 
with local data or using AWS S3. Our measurements also bear this out. 

4.1.1 MAC-PDP 

We observe that when block size is held constant and file size increases, the tag time increases
linearly (see Figures 4.1a and 4.2a). When the file size remains constant and as the
block size varies, the execution time is nearly constant (see Figures 4.1b and 4.2b). 

[Figure 4.1 panels: (a) File size vs. tag time; (b) Block size (bytes) vs. tag time]

Figure 4.1: File and block size vs. tag time for local data experiments.

[Figure 4.2 panels: (a) File size vs. tag time; (b) Block size (bytes) vs. tag time]

Figure 4.2: File and block size vs. tag time for S3 data experiments.

This is explained in terms of MAC-PDP generating tags via a hash-based MAC on every
file block. Since the hash algorithm generates a digest through repeated operations on
fixed-size blocks, the operation time should be proportional to the size of the input. We
summarize these trends in Model 4.1, which expresses the tag time as proportional to the
file size.

$c_0 + c_1 \cdot fs$ (4.1)
4.1.2 A-PDP

We observe that when block size is held constant and file size increases, the tag time increases
linearly (see Figures 4.1a and 4.2a). When the file size is held constant and the
block size increases, the tag time decreases linearly (see Figures 4.1b and 4.2b).

This is explained in terms of A-PDP generating tags through modular exponentiation on
every block. As the file size grows, there will be more blocks to tag, resulting in increased
execution time. As block size increases, there is a corresponding decrease in the number
of blocks to tag. We summarize these trends in Model 4.2, which expresses the tag time as
proportional to the file size and inversely proportional to the block size.

$c_0 + c_1 \cdot fs/bs + c_2 \cdot bs + c_3 \cdot fs$ (4.2)


4.1.3 CPOR 

We observe that when block size is held constant and file size increases, the tag time increases
linearly (see Figures 4.1a and 4.2a). When the file size is held constant and the
block size increases, the tag time remains constant (see Figures 4.1b and 4.2b).

This is explained in terms of CPOR generating tags through nested loops of modular multiplication
and addition. The number of loops is determined by the total number of sectors.
An increase in file size results in a corresponding increase in the number of sectors. However,
since changes in block size have little to no effect on the number of sectors, the algorithm
times remain nearly constant as the block size varies. We summarize these trends
in Model 4.3, which expresses the tag time as proportional to the file size and inversely
proportional to the sector size.

$c_0 + c_1 \cdot fs + c_2 \cdot fs/ss$ (4.3)
4.1.4 SEPDP

We observe that when block size is held constant and file size increases, the tag time increases
linearly up to a point, after which the tag time remains constant (see Figures 4.1a
and 4.2a). When the file size is held constant and the block size increases, the tag time
increases linearly up to a point, after which the tag time remains constant (see Figures 4.1b
and 4.2b).

This is explained in terms of SEPDP generating tokens by calculating the hash of a specified
number of blocks. The tag time, then, is proportional to the number of bytes being
processed, which is determined by the number of blocks per token and the block size. The
number of blocks per token is defined by the default security parameter $\ell$, unless the block
and file sizes are such that there are fewer blocks than the default parameter, in which case
the token consists of all the blocks in the file. We summarize these trends in Model 4.4,
which expresses the tag time as proportional to the total number of bytes processed per
token.

$(c_0 + c_1 \cdot \min(\min(fs/bs, \ell) \cdot bs, fs)) \cdot t$ (4.4)

Above, $\min(\min(fs/bs, \ell) \cdot bs, fs)$ is essentially the number of bytes processed. When
$fs/bs < \ell$, the entire file is processed to generate tokens.
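The four tag-time models of Section 4.1 can be written as small Python functions, which makes it easy to evaluate them for a given file size and block size. This is only a sketch: the function names are hypothetical, and the constants passed in must come from a fit to measured data (no fitted values are assumed here).

```python
def tag_time_mac_pdp(fs, c0, c1):
    """Model 4.1: tag time grows linearly with file size fs (bytes)."""
    return c0 + c1 * fs


def tag_time_a_pdp(fs, bs, c0, c1, c2, c3):
    """Model 4.2: terms in the block count fs/bs, block size bs and fs."""
    return c0 + c1 * fs / bs + c2 * bs + c3 * fs


def tag_time_cpor(fs, ss, c0, c1, c2):
    """Model 4.3: terms in file size fs and sector count fs/ss."""
    return c0 + c1 * fs + c2 * fs / ss


def tag_time_sepdp(fs, bs, ell, t, c0, c1):
    """Model 4.4: cost per token depends on bytes hashed, times t tokens."""
    bytes_per_token = min(min(fs / bs, ell) * bs, fs)
    return (c0 + c1 * bytes_per_token) * t
```

With placeholder constants `c0 = 0` and `c1 = 1`, `tag_time_sepdp` simply returns the number of bytes hashed, which shows the plateau once the file holds more than `ell` blocks.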


4.2 Generate Challenge 

In our experiments, there is no theoretical difference between running the Challenge algorithm
with local data or using AWS S3. Our measurements and resultant models also bear
this out.


4.2.1 MAC-PDP 

We observe that when block size is held constant and file size increases, the challenge 
time runs in constant time up to a point, after which it runs in a slower constant time (see 
Figures 4.3a and 4.4a). When the file size is held constant and the block size increases, the 
challenge time runs in constant time up to a point, after which it runs in a faster constant 
time (see Figures 4.3b and 4.4b). 
[Figure 4.3 panels: (a) File size (kb) vs. challenge time; (b) Block size (bytes) vs. challenge time]

Figure 4.3: File and block size vs. generate challenge time for local data
experiments.

[Figure 4.4 panels: (a) File size (kb) vs. challenge time; (b) Block size (bytes) vs. challenge time]

Figure 4.4: File and block size vs. generate challenge time for S3 data
experiments.


This is explained in terms of $\ell$ and the total number of file blocks, given by fs/bs. When
there are fewer total blocks than $\ell$, then all indices are used for the challenge. However,
when there are more blocks than $\ell$, then the challenge indices must be chosen without replacement,
which still runs in constant time, but takes longer than simply using all available
indices. We summarize these trends in Model 4.5, which expresses the challenge time as
one of two constants.

$\lceil fs/bs \rceil \le \ell : c_0$
$\lceil fs/bs \rceil > \ell : c_1$ (4.5)


4.2.2 A-PDP 

We observe that generate challenge runs in constant time regardless of file or block size (see 
Figures 4.3 and 4.4). This is explained in terms of the A-PDP challenge being independent 
of the file or block size. We summarize these trends in Model 4.6, which expresses the 
challenge time as constant. 


$c_0$ (4.6)


4.2.3 CPOR 

We observe that when block size is held constant and file size increases, the generate chal¬ 
lenge time increases linearly up to a point, after which it runs in constant time (see Fig¬ 
ures 4.3a and 4.4a). When the file size is held constant and the block size increases, the 
challenge time runs in constant time up to a point, after which it decreases linearly (see 
Figures 4.3b and 4.4b). 

This is explained in terms of CPOR generating a random ℓ-element set for the challenge.
As the file size increases, the size of this set increases, until the number of blocks exceeds
ℓ. Similarly, when the block size increases to the point where there are fewer total blocks
than ℓ, then the size of the challenge set will begin to decrease. We summarize these trends
in Model 4.7, which expresses the challenge time as either constant or proportional to the
total number of blocks.


\[
\begin{aligned}
\lceil fs/bs \rceil < \ell &: c_1 + c_2 \cdot fs/bs \\
\lceil fs/bs \rceil > \ell &: c_0
\end{aligned}
\tag{4.7}
\]
4.2.4 SEPDP


We observe that when block size is held constant and file size increases, generate challenge
runs in constant time (see Figures 4.3a and 4.4a). When the file size is held constant and the
block size increases, challenge runs in constant time up to a point, after which the run time
nearly doubles (see Figures 4.3b and 4.4b).

The former trend is explained in terms of SEPDP recomputing k_i and c_i for the i-th
challenge, neither of which is affected by the file size. We are unable to explain the latter trend.
Nothing in the algorithm design suggests that block size should affect the run time, and we
believe that the anomaly is an artifact of implementation, not a feature of the scheme. We
summarize these trends in Model 4.8, which expresses the challenge time as constant.


\[
c_0
\tag{4.8}
\]


4.3 Generate Proof 

In our experiments, there is a noticeable difference between the timing of the Proof algorithm
using local data storage and its timing using remote data storage on AWS S3. We analyze
these two sets of experiments separately.

For experiments interacting with S3, we observe that when block size is held constant and
file size increases, the proof time increases linearly up to the point where the number of
blocks exceeds ℓ, after which the proof time is constant (see Figure 4.7a). When the file
size is held constant and the block size increases, the proof time is nearly constant up to the
point where ℓ exceeds the number of blocks, after which the proof time decreases linearly
(see Figure 4.7b).

This is explained in terms of each GET from S3 taking significantly more time than
generating the proof itself (see Figure 4.6). Thus, the number of GETs dominates the trend. For
MAC-PDP, A-PDP, and CPOR there is one GET for each challenged block and one GET
for each corresponding tag (see Figure 4.5). This is summarized in Equation 4.9, which
expresses the number of GETs as twice the total number of blocks or twice ℓ, whichever is
less.

\[
2 \cdot \min(fs/bs, \ell)
\tag{4.9}
\]


For SEPDP, there is one GET for each challenged block, but only one GET for the token
corresponding to the i-th challenge (see Figure 4.5). This is summarized in Equation 4.10,
which expresses the number of GETs as one more than the total number of blocks or one
more than ℓ, whichever is less.

\[
\min(fs/bs, \ell) + 1
\tag{4.10}
\]
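As a worked illustration of Equations 4.9 and 4.10, the following sketch counts GETs per audit. The function name `num_gets`, the use of a ceiling on fs/bs, and the value ℓ = 460 are assumptions chosen for the example, not parameters taken from this section.

```python
import math


def num_gets(file_size, block_size, ell, scheme):
    """Number of S3 GETs for one proof, following Equations 4.9 and 4.10."""
    # Blocks actually challenged: all of them, or ell, whichever is less.
    challenged = min(math.ceil(file_size / block_size), ell)
    if scheme == "SEPDP":
        # One GET per challenged block plus a single GET for the token.
        return challenged + 1
    if scheme in ("MAC-PDP", "A-PDP", "CPOR"):
        # One GET per challenged block plus one GET per corresponding tag.
        return 2 * challenged
    raise ValueError("unknown scheme: " + scheme)


# 1 MiB file, 4096-byte blocks (256 blocks, fewer than ell = 460)
print(num_gets(1024 * 1024, 4096, 460, "MAC-PDP"))  # 512
print(num_gets(1024 * 1024, 4096, 460, "SEPDP"))    # 257

# 1 GiB file, 4096-byte blocks (more blocks than ell = 460)
print(num_gets(1024 ** 3, 4096, 460, "CPOR"))       # 920
print(num_gets(1024 ** 3, 4096, 460, "SEPDP"))      # 461
```

Under these assumed parameters, SEPDP issues roughly half as many GETs as the other three schemes once the file has more than ℓ blocks.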


4.3.1 MAC-PDP 

For local data experiments, we observe that when block size is held constant and file size
increases, the proof time increases linearly up to the point where the number of blocks
exceeds ℓ, after which the proof time is nearly constant, increasing slightly as the file size
grows (see Figure 4.6a). When the file size is held constant and the block size increases,
the proof time increases linearly up to the point where ℓ exceeds the number of blocks,
after which the proof time is constant (see Figure 4.6b).

This is explained in terms of MAC-PDP generating a proof containing a message block
and hash for each index in the challenge. The proof time depends on the total number of
bytes hashed. We summarize these trends in Model 4.11, which expresses the proof time
as proportional to the total number of blocks, file size, and block size, or proportional to
just the block size.


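\[
\begin{aligned}
\lceil fs/bs \rceil < \ell &: c_0 + c_1 \cdot fs/bs + c_2 \cdot bs + c_3 \cdot fs \\
\lceil fs/bs \rceil > \ell &: c_4 + c_5 \cdot bs
\end{aligned}
\tag{4.11}
\]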
Figure 4.5: File and block size vs. number of GETs from S3.

Figure 4.6: File and block size vs. generate proof time for local data experiments. (a) File size vs. proof time.


4.3.2 A-PDP 

For local data experiments, we observe that when block size is held constant and file size
increases, the proof time increases linearly up to the point where the number of blocks
exceeds ℓ, after which the proof time remains constant (see Figure 4.6a). When the file
size is held constant and the block size increases, the proof time increases linearly (see
Figure 4.6b).

Figure 4.7: File and block size vs. generate proof time for S3 data experiments.

This is explained in terms of A-PDP generating proofs through modular exponentiation
of ℓ message blocks. Thus the proof time will depend on the total number of challenge
blocks as well as the size of each block. We summarize these trends in Model 4.12, which
expresses the proof time as proportional to the number of blocks, file size, and block size
or proportional to just the block size.


\[
\begin{aligned}
\lceil fs/bs \rceil < \ell &: c_2 + c_3 \cdot fs/bs + c_4 \cdot bs + c_5 \cdot fs \\
\lceil fs/bs \rceil > \ell &: c_0 + c_1 \cdot bs
\end{aligned}
\tag{4.12}
\]


4.3.3 CPOR 

For local data experiments, we observe that when block size is held constant and file size
increases, the proof time increases linearly up to the point where the number of blocks
exceeds ℓ, after which the proof time remains constant (see Figure 4.6a). When the file size
is held constant and the block size increases, the proof time increases linearly up to the
point where ℓ exceeds the number of blocks, after which the proof time remains constant
(see Figure 4.6b).

This is explained in terms of CPOR generating the proof by computing μ_j and σ for each
of the indices in the challenge set. Additionally, μ_j includes modular multiplication of all
the sectors of each challenge block. Therefore, the proof time increases with the number of
indices in the challenge set, as well as with the block size. We summarize these trends in
Model 4.13, which expresses the proof time as proportional to the number of blocks, file
size, and block size, or proportional to just the block size.


\[
\begin{aligned}
\lceil fs/bs \rceil < \ell &: c_0 + c_1 \cdot fs/bs + c_2 \cdot bs + c_3 \cdot fs \\
\lceil fs/bs \rceil > \ell &: c_4 + c_5 \cdot bs
\end{aligned}
\tag{4.13}
\]


4.3.4 SEPDP 

For local data experiments, we observe that when block size is held constant and file size
increases, the proof time increases linearly up to the point where the number of blocks
exceeds ℓ, after which the proof time remains constant (see Figure 4.6a). When the file size
is held constant and the block size increases, the proof time increases linearly up to the
point where ℓ exceeds the number of blocks, after which the proof time remains constant
(see Figure 4.6b).

This is explained in terms of SEPDP generating the proof by computing the hash of all 
the message blocks for a particular token. The proof time is proportional, then, to the total 
number of bytes being hashed, given by the number of challenge blocks and block size. We 
summarize these trends in Model 4.14, which expresses the proof time as proportional to 
the total number of blocks, block size, and file size, or proportional to just the block size. 


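\[
\begin{aligned}
\lceil fs/bs \rceil < \ell &: c_0 + c_1 \cdot fs/bs + c_2 \cdot bs + c_3 \cdot fs \\
\lceil fs/bs \rceil > \ell &: c_4 + c_5 \cdot bs
\end{aligned}
\tag{4.14}
\]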
4.4 Verify Proof

In theory, there is no difference between running the Verify algorithm with local data and
running it with data on AWS S3. Our measurements and resultant models also bear this
out.

4.4.1 MAC-PDP 

We observe that when block size is held constant and file size increases, the verify time
increases linearly up to the point where the total number of blocks exceeds ℓ, after
which it remains constant (see Figures 4.8a and 4.9a). When the file size is held constant
and the block size increases, the verify time increases linearly up to the point where ℓ
exceeds the total number of blocks, after which it remains constant (see Figures 4.8b and
4.9b).

This is explained in terms of MAC-PDP verifying a proof by hashing the message block for
each index in the challenge. Therefore, the verify time is dependent on the total number of
bytes hashed. We summarize these trends in Model 4.15, which expresses the verify time as
proportional to the file size or proportional to the block size.


\[
\begin{aligned}
\lceil fs/bs \rceil < \ell &: c_0 + c_1 \cdot fs \\
\lceil fs/bs \rceil > \ell &: c_2 + c_3 \cdot bs
\end{aligned}
\tag{4.15}
\]
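The constants in a model of this form are not listed in this section. One way such constants could be estimated from timing data is an ordinary least-squares fit of a line to one branch of the model, sketched below. The helper name `fit_line` and the data points are illustrative assumptions; the data are synthetic placeholders, not measurements from our experiments.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope), corresponding to (c_0, c_1) in a branch
    of the form c_0 + c_1 * fs.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic placeholder data (NOT experimental results): file sizes in kb
# and verify times generated from the line 0.5 + 0.01 * fs.
file_sizes_kb = [64, 128, 256, 512]
verify_times = [0.5 + 0.01 * fs for fs in file_sizes_kb]

c0, c1 = fit_line(file_sizes_kb, verify_times)
```

The second branch of the model, which is linear in bs, could be fitted the same way using the measurements taken above the threshold ℓ.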


4.4.2 A-PDP 

We observe that when block size is held constant and file size increases, the verify time
increases linearly up to the point where the total number of blocks exceeds ℓ, after which
it runs in constant time (see Figures 4.8a and 4.9a). When the file size is held constant and
the block size increases, the verify time remains constant up to the point where ℓ exceeds
the total number of blocks, after which it decreases linearly (see Figures 4.8b and 4.9b).

Figure 4.8: File and block size vs. verify proof time for local data experiments. (b) Block size (bytes) vs. verify time.

Figure 4.9: File and block size vs. verify proof time for S3 data experiments.

This is explained in terms of A-PDP verifying proofs by generating τ and comparing the
hash of τ with ρ. Since τ is computed by generating ℓ hashes, the algorithm time will
be proportional to the total number of blocks that were challenged. We summarize these
trends in Model 4.16, which expresses the verify time as constant or proportional to the
total number of blocks.


\[
\begin{aligned}
\lceil fs/bs \rceil < \ell &: c_1 + c_2 \cdot fs/bs \\
\lceil fs/bs \rceil > \ell &: c_0
\end{aligned}
\tag{4.16}
\]
4.4.3 CPOR


We observe that when block size is held constant and file size increases, the verify time
increases linearly up to the point where the total number of blocks exceeds ℓ, after which
it runs in constant time (see Figures 4.8a and 4.9a). When the file size is held constant and
the block size increases, the verify time increases linearly (see Figures 4.8b and 4.9b).

This is explained in terms of CPOR verifying the proof by summing α_j μ_j for all sectors
of each block being challenged. As the file size grows, the number of sectors for each
challenge increases. As the block size grows, the number of sectors per block increases.
We summarize these trends in Model 4.17, which expresses the verify time as proportional
to the number of blocks, file size, and block size, or proportional to just the block size.


\[
\begin{aligned}
\lceil fs/bs \rceil < \ell &: c_0 + c_1 \cdot fs/bs + c_2 \cdot bs + c_3 \cdot fs \\
\lceil fs/bs \rceil > \ell &: c_4 + c_5 \cdot bs
\end{aligned}
\tag{4.17}
\]


4.4.4 SEPDP 

We observe that when block size is held constant and file size increases, the verify time
remains constant (see Figures 4.8a and 4.9a). When the file size is held constant and the
block size increases, the verify time remains constant up to a point, after which the verify
time roughly doubles (see Figures 4.8b and 4.9b).

This is explained in terms of the SEPDP verify algorithm decrypting σ and comparing it
with the proof. The decryption time should not be dependent on file size. Additionally, the
decryption time should not be dependent on block size, and we believe that the anomaly is
an artifact of implementation, not a feature of the scheme. We summarize these trends in
Model 4.18, which expresses the verify time as a constant.


\[
c_0
\tag{4.18}
\]
4.5 Total Cost

We break costs down into three basic categories for analysis: (1) the cost to tag, which 
includes the computational costs to compute the tag and the PUT costs of uploading the 
tag; (2) the cost to store the tag; (3) the audit cost, which includes the computational cost to 
challenge, prove, and verify, and the GET costs associated with retrieving file blocks and 
tags during those operations. 

SEPDP is not depicted on the cost graphs because its use of audit tokens does not compare
well with the other schemes. Whereas MAC-PDP, A-PDP, and CPOR all support an
unlimited number of audits once the file is tagged, the number of audits for SEPDP is chosen in
advance. Thus a total cost graph for SEPDP will depend on the desired frequency of audits
before a file needs to be retagged.

We note that the costs in our results should be thought of as minimal costs. We have
ignored auditor costs associated with waiting for a response from the prover, as well as
wake-up costs for the prover when it receives a proof request, which we do not measure
as part of our experiments (see Figure 3.2). Measuring these costs would reflect network
latency and implementation-specific details we do not believe to be strongly related to PDP.
Also, in a scaled implementation of PDP, where multiple audits are performed for clients
simultaneously, the downtime costs may not be consequential. Thus the basis costs we
depict do not reflect actual costs, but can accurately reflect cost comparisons among the
schemes.

We chose to implement our benchmark tests on Amazon Web Services (AWS); however,
there are several alternatives with comparable pricing schemes and storage options. For
example, Microsoft Azure Blob storage, Google Cloud Storage, and Rackspace Cloud Files
all have storage services and pricing schemes similar to Amazon's. The AWS S3 storage
pricing scheme is shown in Table 4.1 (prices were obtained from
https://aws.amazon.com/s3/pricing as of March 2016).
Table 4.1: Amazon Web Services S3 standard storage pricing scheme.

Storage tier             Cost/GB
First 1 TB / month       $0.0300
Next 49 TB / month       $0.0295
Next 450 TB / month      $0.0290
Next 500 TB / month      $0.0285
Next 4000 TB / month     $0.0280
Over 5000 TB / month     $0.0275
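To make the tiered pricing in Table 4.1 concrete, the sketch below computes a monthly storage cost by charging each tier's rate only on the portion of data that falls within that tier. The function name `monthly_storage_cost` and the conversion 1 TB = 1024 GB are assumptions made for this example.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier width in TB, price per GB per month); None marks the open-ended tier.
S3_TIERS = [
    (1, 0.0300),
    (49, 0.0295),
    (450, 0.0290),
    (500, 0.0285),
    (4000, 0.0280),
    (None, 0.0275),
]


def monthly_storage_cost(size_gb):
    """Monthly cost in dollars of storing size_gb under the Table 4.1 tiers."""
    remaining = size_gb
    cost = 0.0
    for width_tb, price_per_gb in S3_TIERS:
        if remaining <= 0:
            break
        if width_tb is None:
            in_tier = remaining
        else:
            in_tier = min(remaining, width_tb * GB_PER_TB)
        cost += in_tier * price_per_gb
        remaining -= in_tier
    return cost


one_gb = monthly_storage_cost(1)       # about $0.03
one_tb = monthly_storage_cost(1024)    # about $30.72
two_tb = monthly_storage_cost(2048)    # about $60.93 (second TB at $0.0295/GB)
```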


Table 4.2: Comparison of cloud providers remote storage limitations.

Provider               Max object size   Max PUT size   Max metadata size
Amazon S3              5 TB              5 GB           2 KB
Microsoft Azure        195 GB            64 MB          8 KB
Google Cloud Storage   5 TB              5 TB           unspecified
Rackspace              5 GB              5 GB           4 KB


4.5.1 Tag Costs 

Tag costs consist of the cost to generate the tag and the PUT costs associated with uploading
the file to storage (see Figure 4.10). These costs resemble the trends we observed for
computational costs associated with generating a tag (see Figure 4.1a), with A-PDP being
the most expensive, followed by CPOR and then MAC-PDP. The approximate basis costs to tag
a file range from a fraction of a cent to $3 for a 1 GB file; $0.13 to $20 for a 1 TB file; and
$135 to $20,400 for a 1 PB file.



Figure 4.10: Cost to tag, based on tag algorithms and AWS EC2 pricing 
4.5.2 Storage Costs

We calculate the storage cost for each scheme (see Figure 4.12) based on its
corresponding tag size (see Table 4.3). As the file size increases, the tag file overhead increases
linearly for MAC-PDP, A-PDP, and CPOR, but remains constant for SEPDP; however, as
the block size increases, the tag file overhead decreases linearly for MAC-PDP, A-PDP, and
CPOR, but increases linearly for SEPDP (see Figure 4.11). Since A-PDP has the largest
tag size, it has the highest storage cost. MAC-PDP and CPOR have almost the same tag
size and, therefore, very similar storage costs.



Figure 4.11: File and block size vs. tag file overhead. (a) File size (kb) vs. overhead.


Table 4.3: Tag file overhead and tag size for each scheme (bs = 4096 bytes).

Scheme     Total tag file overhead (% fs)   Tag size (bytes)
A-PDP      4.864%                           204
MAC-PDP    0.477%                           20
CPOR       0.429%                           18
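As a rough check of how tag overhead translates into storage cost, the sketch below multiplies the overhead percentages from Table 4.3 by the file size and by the first-tier price from Table 4.1. The function name `monthly_tag_cost` and the conversion 1 TB = 1024 GB are assumptions; the flat $0.03 per GB rate only applies while the tag file stays within the first 1 TB pricing tier.

```python
# Tag file overhead as a fraction of file size (Table 4.3, bs = 4096 bytes).
TAG_OVERHEAD = {
    "A-PDP": 0.04864,
    "MAC-PDP": 0.00477,
    "CPOR": 0.00429,
}

FIRST_TIER_PRICE_PER_GB = 0.03  # Table 4.1, first 1 TB / month


def monthly_tag_cost(file_size_gb, scheme):
    """Approximate monthly cost in dollars of storing a file's tags.

    Assumes the tag file is small enough to be billed entirely at the
    first-tier rate.
    """
    tag_size_gb = file_size_gb * TAG_OVERHEAD[scheme]
    return tag_size_gb * FIRST_TIER_PRICE_PER_GB


# Tags for a 1 TB file (assumed to be 1024 GB).
for scheme in ("CPOR", "MAC-PDP", "A-PDP"):
    print(scheme, round(monthly_tag_cost(1024, scheme), 2))
# CPOR 0.13
# MAC-PDP 0.15
# A-PDP 1.49
```

Under these assumptions the A-PDP tags cost roughly ten times as much to store as the MAC-PDP or CPOR tags, in line with the ratio of the tag sizes.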


We investigated the option of storing tags as metadata to reduce cost; however, all the
storage providers we reviewed included metadata as part of the overall file size. Additionally,
at the time of publication, AWS S3 limits metadata storage to 2 KB. The maximum file sizes
at which the tags can be stored as metadata on AWS S3 are shown in Table 4.4.
Figure 4.12: Cost to store tag, based on scheme tag overhead and AWS S3 pricing

Table 4.4: Maximum file sizes at which tags can be stored as metadata on AWS S3.

Scheme     File size
MAC-PDP    428 kb
A-PDP      41 kb
CPOR       476 kb
SEPDP      0 kb


4.5.3 Audit Costs 

We calculate the total audit cost (see Figure 4.13) by determining the number of GETs
and computational cost to generate a challenge, generate a proof, and verify the proof.
Since the proof time is significantly larger than the challenge or verify times (compare
Figure 4.7 with Figures 4.4 and 4.9), we are not surprised to find the proof time dictates the
audit cost trends. Additionally, the differences in proof times observable in the local data
experiments (see Figure 4.6) nearly disappear in the S3 experiments due to the relatively
larger times required to communicate with S3 and transfer proof data. As a consequence
of the communication time common to all schemes, the audit costs are nearly identical for
MAC-PDP, A-PDP, and CPOR.


It is worth noting that the audit cost for SEPDP is approximately half that of the three other 
schemes. The SEPDP proof scheme has fewer GETs than the other schemes since it only 
retrieves a single tag file in each audit, instead of a tag per challenge block, as in the other 
schemes. 



Figure 4.13: Cost to audit, based on audit cost models and AWS EC2 and 
S3 pricing 


4.5.4 Combined Cost Scenarios 

We observe that the monthly cost to store and audit once per hour is nearly identical for all 
schemes until the storage costs begin to dominate at larger file sizes, after which A-PDP 
becomes much more expensive than MAC-PDP and CPOR (see Figure 4.15). 

Since the audit costs are nearly identical for all three schemes, the tag and storage costs
have the most significant impact on the total cost of each scheme. Figures 4.14a and 4.14b
show the up-front cost to tag and the cumulative cost of storing and auditing a 1 GB and a 1 TB
file, respectively, at one audit per hour each month. For the 1 GB file, the tag and storage
costs are less significant and the slightly higher audit cost of MAC-PDP can be observed
at one year of audits; however, the high tag and storage costs of the 1 TB file dominate,
resulting in a higher cost for the A-PDP scheme. The following are approximate basis costs
incorporating the up-front cost to tag and the cumulative cost of storing and auditing at one
audit per hour for one year: $160 to $175 for a 1 GB file; $170 to $230 for a 1 TB file; and
$2,000 to $38,700 for a 1 PB file.
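The way these combined figures come together can be sketched as simple arithmetic: an up-front tag cost, plus twelve months of tag storage, plus one audit per hour for a year. The function name `annual_cost` is a hypothetical helper, and the inputs in the example are the rounded upper-end figures reported in Chapter 5 for a 1 TB file, which we assume here correspond to A-PDP; the result is therefore only a rough approximation.

```python
HOURS_PER_YEAR = 24 * 365


def annual_cost(tag_cost, monthly_storage_cost, cost_per_audit,
                audits_per_hour=1):
    """Up-front tag cost plus one year of tag storage and audits (dollars)."""
    storage = 12 * monthly_storage_cost
    audits = cost_per_audit * audits_per_hour * HOURS_PER_YEAR
    return tag_cost + storage + audits


# Rounded inputs for a 1 TB file (assumed to be the A-PDP figures):
# about $20 to tag, $1.50 per month to store tags, $0.02 per audit.
total = annual_cost(tag_cost=20.0, monthly_storage_cost=1.50,
                    cost_per_audit=0.02)
print(round(total, 2))  # 213.2
```

With these rounded inputs the audits alone account for about $175 of the total, which illustrates why the audit cost sets the floor of each range while tag and storage costs determine the spread between schemes.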




Figure 4.14: Cumulative tag, storage, and audit costs for one audit per hour. (a) Tag, storage, and audit costs for 1 GB file. (b) Tag, storage, and audit costs for 1 TB file.

Figure 4.15: File size vs. storage and audit costs for files at one audit per hour for one month. (a) File size vs. storage and audit costs. (b) File size vs. storage and audit costs.
CHAPTER 5:
Conclusion


We have developed generic cost models for four PDP schemes, which can be used to
infer future cost. Additionally, we have shown that audit costs of some sophisticated PDP
schemes (A-PDP, CPOR) are nearly identical to those of the simple MAC-PDP scheme,
whereas tag and storage costs have a significant impact on total cost differences among
the schemes. We conclude that the total costs of MAC-PDP and CPOR are comparable,
whereas the cost of A-PDP becomes expensive relative to the other schemes at large file
sizes. Our preliminary experimentation shows the audit cost for SEPDP is about half that of
the other schemes; however, the scheme is limited to a finite number of audits.

From cost projections based on generic models for MAC-PDP, A-PDP, and CPOR, we find
the basis cost for tagging is less than $1 for a 1 GB file; $0.13 to $20 for a 1 TB file; and
$135 to $20,400 for a 1 PB file. The monthly basis cost for storage is a fraction of a cent
for a 1 GB file; $0.13 to $1.50 for a 1 TB file; and $130 to $1,500 for a 1 PB file. The
cost for a single audit is approximately $0.02 for files larger than 2 MB. Combined cost
projections incorporating the up-front cost to tag and the cumulative cost of storing and
auditing at one audit per hour for one year show basis costs of $160 to $175 for a 1 GB file;
$170 to $230 for a 1 TB file; and $2,000 to $38,700 for a 1 PB file.


5.1 Future Work 

While our benchmark tests covered a limited number and type of PDP implementations,
future studies could compare schemes that incorporate erasure codes, dynamic data, or
distributed file system storage, among other variants. Our experiments ignored costs
associated with transfer time and service latency, focusing instead on computational costs.
Follow-on work could separate the client, auditor, and prover in order to measure the
communication costs between each entity. Lastly, follow-on work could compare costs
under different security parameters. In our experiments, we selected security
parameters designed to normalize comparison in terms of the strength of audit (as defined in
Chapter 2). Future work could select parameters to facilitate scheme comparison in terms
of other properties, such as strength of security and efficiency of recovery.